UniVG: Towards UNIfied-modal Video Generation
Diffusion based video generation has received extensive attention and
achieved considerable success within both the academic and industrial
communities. However, current efforts are mainly concentrated on
single-objective or single-task video generation, such as generation driven by
text, by image, or by a combination of text and image. This cannot fully meet
the needs of real-world application scenarios, as users are likely to input
images and text conditions in a flexible manner, either individually or in
combination. To address this, we propose a Unified-modal Video Generation
system that is capable of handling multiple video generation tasks across text
and image modalities. To this end, we revisit the various video generation
tasks within our system from the perspective of generative freedom, and
classify them into high-freedom and low-freedom video generation categories.
For high-freedom video generation, we employ Multi-condition Cross Attention to
generate videos that align with the semantics of the input images or text. For
low-freedom video generation, we introduce Biased Gaussian Noise to replace the
pure random Gaussian Noise, which helps to better preserve the content of the
input conditions. Our method achieves the lowest Fréchet Video Distance (FVD)
on the public academic benchmark MSR-VTT, surpasses the current open-source
methods in human evaluations, and is on par with the current closed-source
method Gen2. For more samples, visit https://univg-baidu.github.io
Generative Graph Convolutional Network for Growing Graphs
Modeling the generative process of growing graphs has wide applications in social
networks and recommendation systems, where the cold start problem leaves new
nodes isolated from the existing graph. Despite the emerging literature on learning
graph representations and graph generation, most existing methods cannot handle isolated
new nodes without nontrivial modifications. The challenge arises because
learning to generate representations for nodes in the observed graph
relies heavily on topological features, whereas for new nodes only node
attributes are available. Here we propose a unified generative graph
convolutional network that learns node representations for all nodes adaptively
in a generative model framework, by sampling graph generation sequences
constructed from observed graph data. We optimize over a variational lower
bound that consists of a graph reconstruction term and an adaptive
Kullback-Leibler divergence regularization term. We demonstrate the superior
performance of our approach on several benchmark citation network datasets.
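As a rough illustration of the objective described in this abstract, a generic variational lower bound with a reconstruction term and a weighted Kullback-Leibler term can be written as below. The symbols are assumptions for illustration only: A is the adjacency matrix, X the node attributes, Z the latent node representations, and β a stand-in weight for the adaptive regularization, whose exact form the abstract does not give.

```latex
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{q_\phi(Z \mid X, A)}\bigl[\log p_\theta(A \mid Z)\bigr]
  - \beta \, \mathrm{KL}\bigl(q_\phi(Z \mid X, A) \,\|\, p(Z)\bigr)
```

The first term rewards reconstructing the observed graph from the latent representations; the second keeps the approximate posterior close to the prior.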
Towards Robust Off-Policy Learning for Runtime Uncertainty
Off-policy learning plays a pivotal role in optimizing and evaluating policies prior to online deployment. However, during real-time serving, we observe a variety of interventions and constraints that cause inconsistency between the online and offline settings, which we summarize and term runtime uncertainty. Such uncertainty cannot be learned from the logged data because it is abnormal and rare. To ensure a certain level of robustness, we perturb the off-policy estimators along an adversarial direction in view of the runtime uncertainty. This allows the resulting estimators to be robust not only to observed but also to unexpected runtime uncertainties. Leveraging this idea, we bring runtime-uncertainty robustness to three major off-policy learning methods: the inverse propensity score method, the reward-model method, and the doubly robust method. We theoretically justify the robustness of our methods to runtime uncertainty, and demonstrate their effectiveness using both simulation and real-world online experiments.
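For readers unfamiliar with the three estimator families named in this abstract, the following is a minimal sketch of their standard textbook forms on a toy logged dataset. It does not implement the adversarial perturbation the paper proposes; the function names, the log layout, and the toy policy and reward model are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# One logged interaction: (context, action, reward, logging propensity).
# This record layout is an assumption for the sketch.
Log = Tuple[int, int, float, float]
Policy = Callable[[int, int], float]       # probability of action given context
RewardModel = Callable[[int, int], float]  # predicted reward for (context, action)


def ips(logs: Sequence[Log], target: Policy) -> float:
    """Inverse propensity score: reweight logged rewards by target/logging."""
    return sum(target(x, a) / p * r for x, a, r, p in logs) / len(logs)


def direct_method(logs: Sequence[Log], target: Policy,
                  model: RewardModel, actions: Sequence[int]) -> float:
    """Reward-model estimate: average predicted reward under the target policy."""
    return sum(
        sum(target(x, b) * model(x, b) for b in actions)
        for x, _, _, _ in logs
    ) / len(logs)


def doubly_robust(logs: Sequence[Log], target: Policy,
                  model: RewardModel, actions: Sequence[int]) -> float:
    """Doubly robust: model baseline plus an IPS-weighted residual correction."""
    total = 0.0
    for x, a, r, p in logs:
        baseline = sum(target(x, b) * model(x, b) for b in actions)
        correction = target(x, a) / p * (r - model(x, a))
        total += baseline + correction
    return total / len(logs)


# Toy data: one context, two actions, uniform logging policy.
# Action 0 always pays 1.0 and action 1 always pays 0.0.
ACTIONS: List[int] = [0, 1]
LOGS: List[Log] = [(0, 0, 1.0, 0.5), (0, 1, 0.0, 0.5),
                   (0, 0, 1.0, 0.5), (0, 1, 0.0, 0.5)]


def target_policy(x: int, a: int) -> float:
    # Hypothetical target policy: prefers action 0 with probability 0.8.
    return 0.8 if a == 0 else 0.2


def reward_model(x: int, a: int) -> float:
    # Hypothetical, slightly miscalibrated reward model.
    return 0.9 if a == 0 else 0.1


ips_value = ips(LOGS, target_policy)
dm_value = direct_method(LOGS, target_policy, reward_model, ACTIONS)
dr_value = doubly_robust(LOGS, target_policy, reward_model, ACTIONS)
```

On this toy data the true value of the target policy is 0.8. The inverse propensity score and doubly robust estimates recover it, while the reward-model estimate inherits the model's bias and gives about 0.74. A mismatch between logged propensities and what actually happens at serving time is one example of the kind of inconsistency that would break the reweighting step.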